
    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
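    The appeal of Tikhonov regularization for low-latency spectrum decomposition is that, unlike iterative NMF-style updates, it has a closed-form solution per frame. A minimal sketch of the idea (not the thesis's actual model; the basis matrix, test frame, and regularization weight below are made up for illustration):

    ```python
    import numpy as np

    def tikhonov_decompose(x, B, lam=0.1):
        """Decompose one magnitude-spectrum frame x (shape F) onto basis
        spectra B (shape F x K) by ridge regression:
            argmin_g ||B g - x||^2 + lam * ||g||^2
        Closed form, so each frame costs one small linear solve."""
        K = B.shape[1]
        return np.linalg.solve(B.T @ B + lam * np.eye(K), B.T @ x)

    # Toy example: a frame that is an exact mix of two basis spectra.
    B = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])
    x = B @ np.array([2.0, 1.0])
    g = tikhonov_decompose(x, B, lam=1e-6)  # close to [2.0, 1.0]
    ```

    With a small `lam` the recovered gains match the true mixing weights almost exactly; larger values trade fidelity for smoother, noise-robust activations.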

    Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music

    The extraction of pitch information is arguably one of the most important tasks in automatic music description systems. However, previous research and evaluation datasets dealing with pitch estimation focused on relatively limited kinds of musical data. This work aims to broaden this scope by addressing symphonic western classical music recordings, focusing on pitch estimation for melody extraction. This material is characterized by a high number of overlapping sources, and by the fact that the melody may be played by different instrumental sections, often alternating within an excerpt. We evaluate the performance of eleven state-of-the-art pitch salience functions, multipitch estimation and melody extraction algorithms when determining the sequence of pitches corresponding to the main melody in a varied set of pieces. An important contribution of the present study is the proposed evaluation framework, including the annotation methodology, generated dataset and evaluation metrics. The results show that the assumptions made by certain methods hold better than others when dealing with this type of music signal, leading to better performance. Additionally, we propose a simple method for combining the output of several algorithms, with promising results. This work is supported by the European Union Seventh Framework Programme FP7/2007-2013 through the PHENICX project (grant agreement no. 601166). This work is also (partly) supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
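    The abstract does not specify the combination scheme, but one simple, commonly used approach is per-frame voting: for each frame, pick the pitch estimate that the most algorithms agree on within a pitch tolerance. A hypothetical sketch (the tolerance, unvoiced convention, and toy data are assumptions, not the paper's method):

    ```python
    import numpy as np

    def combine_pitch_tracks(tracks, tol_cents=50):
        """Combine per-frame pitch estimates (Hz) from several algorithms.
        tracks: array of shape (n_algorithms, n_frames); 0 marks unvoiced.
        For each frame, return the candidate that the most estimates
        agree with, within tol_cents."""
        tracks = np.asarray(tracks, dtype=float)
        out = np.zeros(tracks.shape[1])
        for t in range(tracks.shape[1]):
            cands = tracks[:, t][tracks[:, t] > 0]
            if cands.size == 0:
                continue  # all algorithms say unvoiced
            # Pairwise distances in cents; count agreements per candidate.
            cents = 1200.0 * np.log2(cands[:, None] / cands[None, :])
            votes = (np.abs(cents) <= tol_cents).sum(axis=1)
            out[t] = cands[np.argmax(votes)]
        return out

    tracks = [[220.0, 440.0,   0.0],
              [221.0, 330.0,   0.0],
              [222.0, 331.0,   0.0],
              [219.0, 329.0, 440.0]]
    combined = combine_pitch_tracks(tracks)
    ```

    In frame 1 the lone octave-error estimate (440 Hz) is outvoted by the three estimates clustered near 330 Hz, which is exactly the kind of disagreement such a combination is meant to resolve.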

    What/when causal expectation modelling applied to audio signals

    A causal system to represent a stream of music as musical events, and to generate further expected events, is presented. Starting from an auditory front-end that extracts low-level (e.g. MFCC) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing musical stream events using both timbre and time descriptions. The time events are represented using inter-onset intervals relative to the beats. These symbols are then processed by an expectation module using Predictive Partial Match, a multiscale technique based on N-grams. To characterise the ability of the system to generate an expectation that matches both ground truth and system transcription, we introduce several measures that take into account the uncertainty associated with the unsupervised encoding of the musical sequence. The system is evaluated using a subset of the ENST-drums database of annotated drum recordings. We compare three approaches to combining timing (when) and timbre (what) expectation. In our experiments, we show that the induced representation is useful for generating expectation patterns in a causal way. This work is partially funded by the EmCAP project (European Commission FP6-IST, contract 013123).
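    The core of a PPM-style expectation module is an N-gram model that backs off from the longest matching context to shorter ones when predicting the next symbol. A simplified sketch of that mechanism (the class name and API are made up, and the full PPM escape-probability scheme is omitted):

    ```python
    from collections import Counter, defaultdict

    class NGramExpectation:
        """Toy causal expectation model: counts symbol continuations for
        every context up to max_order, and predicts by backing off to the
        longest previously seen context."""
        def __init__(self, max_order=3):
            self.max_order = max_order
            self.counts = defaultdict(Counter)
            self.history = []

        def observe(self, symbol):
            # Credit the new symbol to every context ending at "now".
            for n in range(1, self.max_order + 1):
                if len(self.history) >= n:
                    self.counts[tuple(self.history[-n:])][symbol] += 1
            self.history.append(symbol)

        def expect(self):
            # Back off from the longest context to shorter ones.
            for n in range(min(self.max_order, len(self.history)), 0, -1):
                ctx = tuple(self.history[-n:])
                if ctx in self.counts:
                    return self.counts[ctx].most_common(1)[0][0]
            return None  # no context seen yet

    model = NGramExpectation(max_order=2)
    for s in "ABABAB":
        model.observe(s)
    pred = model.expect()  # after ...A B, expects 'A'
    ```

    In the paper's setting the symbols would come from the unsupervised timbre/timing clustering rather than characters, but the prediction logic is the same.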

    Computational models of music perception and cognition I: the perceptual and cognitive processing chain

    We present a review of perception and cognition models designed for or applicable to music. An emphasis is put on computational implementations. We include findings from different disciplines: neuroscience, psychology, cognitive science, artificial intelligence, and musicology. The article summarizes the methodology that these disciplines use to approach the phenomena of music understanding, the localization of musical processes in the brain, and the flow of cognitive operations involved in turning physical signals into musical symbols, going from the transducers to the memory systems of the brain. We discuss formal models developed to emulate, explain and predict phenomena involved in early auditory processing, pitch processing, grouping, source separation, and music structure computation. We cover generic computational architectures of attention, memory, and expectation that can be instantiated and tuned to deal with specific musical phenomena. Criteria for the evaluation of such models are presented and discussed. Thereby, we lay out the general framework that provides the basis for the discussion of domain-specific music models in Part II. This work is funded by EU Open FET IST-FP6-013123 (EmCAP) and the Spanish TIC project ProSeMus (TIN2006-14932-C02-01). The first author also received support from a Juan de la Cierva scholarship from the Spanish Ministry of Education and Science. The third author is funded by the Austrian National Science Fund, FWF (project: P19349-N15).

    Computational models of music perception and cognition II: domain-specific music processing

    In Part I [Purwins H, Herrera P, Grachten M, Hazan A, Marxer R, Serra X. Computational models of music perception and cognition I: The perceptual and cognitive processing chain. Physics of Life Reviews 2008, in press, doi:10.1016/j.plrev.2008.03.004], we addressed the study of cognitive processes that underlie auditory perception of music, and their neural correlates. The aim of the present paper is to summarize empirical findings from music cognition research that are relevant to three prominent music-theoretic domains: rhythm, melody, and tonality. Attention is paid to how cognitive processes like category formation, stimulus grouping, and expectation can account for the key music-theoretic concepts in these domains, such as beat, meter, voice, and consonance. We give an overview of computational models that have been proposed in the literature for a variety of music processing tasks related to rhythm, melody, and tonality. Although the present state of the art in computational modeling of music cognition definitely provides valuable resources for testing specific hypotheses and theories, we observe the need for models that integrate the various aspects of music perception and cognition into a single framework. Such models should be able to account for aspects that until now have only rarely been addressed in computational models of music cognition, like the active nature of perception and the development of cognitive capacities from infancy to adulthood. This work is funded by EU Open FET IST-FP6-013123 (EmCAP) and the Spanish TIC project ProSeMus (TIN2006-14932-C02-01). The first author also received support from a Juan de la Cierva scholarship from the Spanish Ministry of Education and Science. The second author is funded by the Austrian National Science Fund, FWF (project: P19349-N15).
